Tochastic Gradient Descent
نویسندگان
چکیده
Mini-batch based Stochastic Gradient Descent(SGD) has been widely used to train deep neural networks efficiently. In this paper, we design a general framework to automatically and adaptively select training data for SGD. The framework is based on neural networks and we call it Neural Data Filter (NDF). In Neural Data Filter, the whole training process of the original neural network is monitored and supervised by a deep reinforcement network, which controls whether to filter some data in sequentially arrived mini-batches so as to maximize future accumulative reward (e.g., validation accuracy). The SGD process accompanied with NDF is able to use less data and converge faster while achieving comparable accuracy as the standard SGD trained on the full dataset. Our experiments show that NDF bootstraps SGD training for different neural network models including Multi Layer Perceptron Network and Recurrent Neural Network trained on various types of tasks including image classification and text understanding.
منابع مشابه
Mproving S Tochastic G Radient D Escent with F Eedback
In this paper we propose a simple and efficient method for improving stochastic gradient descent methods by using feedback from the objective function. The method tracks the relative changes in the objective function with a running average, and uses it to adaptively tune the learning rate in stochastic gradient descent. We specifically apply this idea to modify Adam, a popular algorithm for tra...
متن کاملAn eigenvalue study on the sufficient descent property of a modified Polak-Ribière-Polyak conjugate gradient method
Based on an eigenvalue analysis, a new proof for the sufficient descent property of the modified Polak-Ribière-Polyak conjugate gradient method proposed by Yu et al. is presented.
متن کاملExtensions of the Hestenes-Stiefel and Polak-Ribiere-Polyak conjugate gradient methods with sufficient descent property
Using search directions of a recent class of three--term conjugate gradient methods, modified versions of the Hestenes-Stiefel and Polak-Ribiere-Polyak methods are proposed which satisfy the sufficient descent condition. The methods are shown to be globally convergent when the line search fulfills the (strong) Wolfe conditions. Numerical experiments are done on a set of CUTEr unconstrained opti...
متن کاملA Note on the Descent Property Theorem for the Hybrid Conjugate Gradient Algorithm CCOMB Proposed by Andrei
In [1] (Hybrid Conjugate Gradient Algorithm for Unconstrained Optimization J. Optimization. Theory Appl. 141 (2009) 249 - 264), an efficient hybrid conjugate gradient algorithm, the CCOMB algorithm is proposed for solving unconstrained optimization problems. However, the proof of Theorem 2.1 in [1] is incorrect due to an erroneous inequality which used to indicate the descent property for the s...
متن کاملHandwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of Back propagation algorithm with changing training patterns and the second momentum term in feed forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...
متن کامل